Random subspace method for multivariate feature selection

Authors

  • Carmen Lai
  • Marcel J. T. Reinders
  • Lodewyk F. A. Wessels
Abstract

In a growing number of domains, the collected data have a large number of features. This poses a challenge to classical pattern recognition techniques, since the number of samples is often still small relative to the number of features. Classical pattern recognition methods suffer from this small sample size, so robust classification techniques are needed. To reduce the dimensionality of the feature space, selecting informative features becomes an essential step towards the classification task. The relevance of the features can be evaluated either individually (univariate approaches) or in a multivariate manner. Univariate approaches are simple and fast, and therefore appealing. However, they do not consider possible correlations and dependencies between the features, so multivariate search techniques may be helpful. Several limitations restrict the use of multivariate searches. First, they are prone to overtraining, especially in p ≫ n (many features, few samples) settings. Second, they can be computationally too expensive when dealing with a large feature space. We introduce a new multivariate search technique that is less sensitive to the noise in the data and is also computationally feasible. We compare our approach with several multivariate and univariate feature selection techniques on an artificial dataset that provides ground-truth information. The results show the importance of multivariate search techniques and the robustness and reliability of our new algorithm.
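
The abstract contrasts univariate feature scoring with multivariate search, and the title names a random subspace method, but the algorithm itself is not spelled out here. The sketch below is a generic illustration of the two ideas and is not the authors' method. `fisher_score` rates one feature at a time. `random_subspace_scores` repeatedly draws random subsets of `k` features, evaluates a nearest-mean classifier on a stratified 50/50 hold-out split, and credits each feature with the average accuracy of the subsets it appeared in. The function names, the nearest-mean classifier, the parameters `k`, `n_subspaces` and `seed`, and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def fisher_score(X, y, j):
    """Univariate relevance of feature j for labels 0/1: squared mean gap over summed variances."""
    a = [row[j] for row, c in zip(X, y) if c == 0]
    b = [row[j] for row, c in zip(X, y) if c == 1]
    den = statistics.pvariance(a) + statistics.pvariance(b)
    return (statistics.mean(a) - statistics.mean(b)) ** 2 / den if den > 0 else 0.0


def nearest_mean_accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Fit a nearest-mean classifier on the features in `feats`; return hold-out accuracy."""
    means = {}
    for c in (0, 1):
        rows = [r for r, label in zip(X_tr, y_tr) if label == c]
        means[c] = [statistics.mean(r[j] for r in rows) for j in feats]
    correct = 0
    for r, label in zip(X_te, y_te):
        # Squared Euclidean distance to each class mean in the chosen subspace.
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(feats, means[c])) for c in (0, 1)}
        correct += min(dist, key=dist.get) == label
    return correct / len(y_te)


def random_subspace_scores(X, y, k=2, n_subspaces=300, seed=0):
    """Hypothetical scorer: mean hold-out accuracy of random k-feature subsets containing each feature.

    Assumes labels 0/1 with at least two samples per class.
    """
    rng = random.Random(seed)
    p = len(X[0])
    by_class = {c: [i for i, label in enumerate(y) if label == c] for c in (0, 1)}
    totals, counts = [0.0] * p, [0] * p
    for _ in range(n_subspaces):
        train, test = [], []
        for idx in by_class.values():
            # Stratified split so both classes appear in the training and test halves.
            shuffled = rng.sample(idx, len(idx))
            half = len(shuffled) // 2
            train += shuffled[:half]
            test += shuffled[half:]
        feats = rng.sample(range(p), k)  # one random subspace
        acc = nearest_mean_accuracy(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test], feats)
        for j in feats:
            totals[j] += acc
            counts[j] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Toy data (an assumption): features 0 and 1 shift by 3.0 for class 1; features 2-5 are pure noise.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[data_rng.gauss(0, 1) + (3.0 * c if j < 2 else 0.0) for j in range(6)] for c in y]

uni = [fisher_score(X, y, j) for j in range(6)]
multi = random_subspace_scores(X, y)
print("univariate top-2:", sorted(range(6), key=lambda j: -uni[j])[:2])
print("random subspace top-2:", sorted(range(6), key=lambda j: -multi[j])[:2])
```

Both scorers should rank features 0 and 1 first on this toy data. The subspace scorer evaluates features jointly, but the nearest-mean rule uses plain Euclidean distance and therefore ignores feature correlations. Swapping in a covariance-aware classifier, such as linear discriminant analysis, would let subset evaluation pick up the dependencies between features that the abstract refers to.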

Similar sources

RANDOM SUBSPACE METHOD IN CLASSIFICATION AND MAPPING OF FMRI DATA PATTERNS

RANDOM SUBSPACE METHOD IN CLASSIFICATION AND MAPPING OF FMRI DATA PATTERNS Tianwen Chen, PhD George Mason University, 2011 Dissertation Director: Dr. Daniel E. Houser The functional magnetic resonance imaging (fMRI) technique is widely used in studying human brain functions. It measures brain activities both spatially and temporally. The past decade has witnessed a growing interest in the fMRI ...

A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered an important issue in the classification domain. Selecting good features through a maximum-relevance criterion with respect to the class label and minimum redundancy among features helps improve classification accuracy. However, most current feature selection algorithms work only with centralized methods. In this paper, we suggest a distributed version of the mRMR featu...

Audio Genre Classification with Semi-Supervised Feature Ensemble Learning

Widespread availability and use of music have made automated audio genre classification an important field of research. Thanks to feature extraction systems, not only music data but also features for them have become readily available. However, hand-labeling a large amount of music data is time consuming. In this study, we introduce a semi-supervised random feature ensemble method for audio ...

Selection of Relevant and Non-Redundant Feature Subspaces for Co-training

On high-dimensional data sets, choosing subspaces randomly, as in the RASCO (Random Subspace Method for Co-training, Wang et al. 2008) algorithm, may produce diverse but inaccurate classifiers for Co-training. To remedy this problem, we introduce two algorithms for selecting relevant and non-redundant feature subspaces for Co-training. The first algorithm, relevant random subspaces (Rel-RASCO), p...

Weighted random subspace method for high dimensional data classification.

High dimensional data, especially those emerging from genomics and proteomics studies, pose significant challenges to traditional classification algorithms because the performance of these algorithms may substantially deteriorate due to high dimensionality and existence of many noisy features in these data. To address these problems, pre-classification feature selection and aggregating algorith...

Journal title:
  • Pattern Recognition Letters

Volume 27  Issue

Pages  -

Publication date 2006